Language model acquisition from a text corpus for speech understanding

Authors

  • Tatsuo Matsuoka
  • Robert Hasson
  • Michael Barlow
  • Sadaoki Furui

Abstract

Speech understanding can be viewed as the problem of translating input natural language (speech recognition results) into an output semantic language. This paper describes the automatic acquisition, from a text corpus and using a stochastic method, of a language model for translating natural language into semantic language. The method estimates co-occurrence probabilities of input and output grammar rules and uses them as a translation language model. Because the amount of text available is limited, estimating a reliable language model is difficult. We therefore propose a method of concisely modeling the input and output grammars so that a reliable translation model can be estimated. Experiments on the ARPA ATIS task show that the method is effective.
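The abstract does not state how the co-occurrence probabilities are estimated. As a minimal sketch, assuming each training sentence comes paired with the input (natural-language) grammar rules used to parse it and the output (semantic-language) rules of its meaning representation, and assuming plain relative-frequency estimation, the probabilities could be computed as below. The data layout, the rule strings, and the function name `estimate_cooccurrence` are hypothetical; this is not the authors' method, and it omits their concise grammar modeling.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical training item: (input grammar rules, output grammar rules)
# for one sentence and its semantic-language representation.
RulePair = Tuple[List[str], List[str]]


def estimate_cooccurrence(corpus: Iterable[RulePair]) -> Dict[str, Dict[str, float]]:
    """Estimate P(output rule | input rule) by relative frequency.

    Assumption: every input rule of a sentence is counted as co-occurring
    with every output rule of that same sentence's semantic representation.
    """
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for input_rules, output_rules in corpus:
        for r_in in input_rules:
            for r_out in output_rules:
                pair_counts[r_in][r_out] += 1

    # Normalize each input rule's counts so its conditional distribution sums to 1.
    model: Dict[str, Dict[str, float]] = {}
    for r_in, counts in pair_counts.items():
        total = sum(counts.values())
        model[r_in] = {r_out: c / total for r_out, c in counts.items()}
    return model


if __name__ == "__main__":
    # Toy ATIS-style data with made-up rule names, for illustration only.
    corpus = [
        (["S -> show NP", "NP -> flights"], ["QUERY -> list(FLIGHT)"]),
        (["S -> show NP", "NP -> fares"], ["QUERY -> list(FARE)"]),
    ]
    model = estimate_cooccurrence(corpus)
    print(model["S -> show NP"])  # {'QUERY -> list(FLIGHT)': 0.5, 'QUERY -> list(FARE)': 0.5}
```

With only two sentences, the shared rule `S -> show NP` splits its probability evenly between the two output rules, which illustrates why sparse text makes such estimates unreliable, the problem the abstract's concise grammar modeling is meant to address.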

Similar resources

Multipass algorithm for acquisition of salient acoustic morphemes

We are interested in spoken language understanding within the domain of automated telecommunication services. Our current methodology involves training statistical language models from large annotated corpora for recognition and understanding. Since the transcribing of large speech corpora is a resource consuming task, we are motivated to exploit speech without transcriptions. In particular, we...

First steps in building a large vocabulary continuous speech recognition system for Vietnamese

This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...

A'lam Corpus: A Standard Named Entity Corpus for the Persian Language

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...

Variation Sets Facilitate Artificial Language Learning

Variation set structure — partial alignment of successive utterances in child-directed speech — has been shown to correlate with progress in the acquisition of syntax by children. The present study demonstrates that arranging a certain proportion of utterances in a training corpus in variation sets facilitates word segmentation and phrase structure learning in miniature artificial languages by ...

Integrated Recognition and Interpretation of Speech for a Construction Task Domain

The development of speech processing front-ends for the controlling of complex systems has received more and more interest during the last years. Usually this task is divided in two subtasks. The speech recogniser records the utterance and puts out a corresponding text, and the speech understanding module tries to extract an internal representation of the meaning of the utterance. As shown in F...

Publication date: 1996